-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(rust, python): include null count in rolling window validity with min_periods
#13863
Conversation
@@ -86,7 +93,7 @@ def test_rolling_map_sum_int_cast_to_float() -> None: | |||
function=lambda s: s.sum(), window_size=3, weights=[1.0, 2.0, 3.0] | |||
) | |||
|
|||
expected = pl.Series("A", [None, None, 32.0, 20.0, 48.0], dtype=pl.Float64) | |||
expected = pl.Series("A", [None, None, 32.0, None, None], dtype=pl.Float64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was previously incorrect, as it did not take into account the null values in the window.
Some(5.0), | ||
Some(11.0) | ||
] | ||
&[None, None, Some(3.0), None, None, None, None,] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was also previously incorrect.
@@ -111,6 +111,12 @@ mod inner_mod { | |||
// we are in bounds | |||
let arr_window = unsafe { arr.slice_typed_unchecked(start, size) }; | |||
|
|||
// ensure we still meet window size criteria after removing null values | |||
if size - arr_window.null_count() < options.min_periods { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize that the null count is more efficiently calculated by only tracking the head and tail of each new window (i.e. decrementing if a null drops off and incrementing if a new null pops up), but because this is the slow rolling_map
this is almost guaranteed not to be a bottleneck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah.. Indeed. This might be the worst code in the repo. :(
Resolves #13856.